2  Calculating neighborhood densities

2.1 Overview

This tutorial follows from the Section 2. “Modeling considerations when estimating CDD” of the main text:

Section 2. “Modeling considerations when estimating CDD”

For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance (Uriarte et al. 2010; Canham et al. 2006). Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.

Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the data set contains the location of mapped stems. If that is not available (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.

We then demonstrate how to weight the calculation of neighborhood density by individual size (i.e., basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effects of density effects to saturate (Uriarte et al. 2010; Canham et al. 2006).

To assess which shape parameters of the exponential decay function is most appropriate for the data set, we fit models with multiple combinations of decay function values and compare models using log likelihood.

From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It’s possible to make this approach more efficient by subdividing the data (e.g., by plot) or using more efficient data structures such as data.table.

We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from data (Barber et al. 2022). An excellent case study using Stan is available here.

Note

The following code is adapted from the latitudinalCNDD repository by Lisa Hülsmann.

2.2 Load libraries and data

Code
# Load libraries
library(tidyr)
library(dplyr)
library(ggplot2)
library(parallel)
library(here)
library(spatstat.geom)
library(mgcv) # For fitting gams
library(lubridate) # For calculating census intervals
library(broom) # For processing fit models
library(purrr)
library(kableExtra) # For table styling

2.3 Data format explanation

For this tutorial, we will be using an example data set from Barro Colorado Island (BCI) (available here) that includes 7,028 observations, of 3,771 individuals of 16 species across two census intervals with the 50 ha BCI 50 ha forest dynamics plot (more information here). Each stem is individually mapped, which allows us to calculate neighborhood density across different distance thresholds.

The code below assumes the data is in a format where each row is an observation for an individual from a census. For this data set, the column descriptions are as follows:

  • treeID: unique identifier for each tree

  • sp: species code

  • gx: spatial coordinate on x axis

  • gy: spatial coordinate on y axis

  • dbh: diameter at breast height (mm)

  • ba: basal area (m^2)

  • status: status at census, A = alive, D = dead

  • date: date of observation

  • census: census number

  • surv: survival status at census, 1 = alive, 0 = dead

  • surv_next: survival status at next census, 1 = alive, 0 = dead

  • mort: mortality status at census, 1 = dead, 0 = alive

  • mort_next: mortality status at next census, 1 = dead, 0 = alive

  • interval: time interval between censuses in years

Let’s take a quick look at the data set we’ll be working with:

Code
head(dat, n = 5)
# A tibble: 5 × 14
  treeID sp        gx    gy   dbh     ba status date        surv surv_next  mort
  <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <chr>  <date>     <dbl>     <dbl> <dbl>
1 3884   ceibpe  296.  24.8  1500 1.77   A      1981-11-10     1         1     0
2 3998   cordbi  280.  45.4   281 0.0620 A      1981-12-11     1         1     0
3 4065   ceibpe  289. 266.   2000 3.14   A      1982-01-08     1         1     0
4 4070   cordbi  283. 287.    356 0.0995 A      1982-01-08     1         1     0
5 4119   ceibpe  258. 224.   1600 2.01   A      1981-12-13     1         1     0
# ℹ 3 more variables: mort_next <dbl>, interval <dbl>, census <dbl>

We can produce a plot Figure 2.1 of the tree locations, where the size of the point is scaled to basal area and colored by species, with each census displayed as a panel:

Code
ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) + 
  geom_point() + 
  facet_wrap(~census) +
  theme_bw(10) + 
  theme(legend.position = "right") + 
  labs(size = "Basal area", col = "Species")

Figure 2.1: Map of tree locations by census

2.4 Define exponential decay function

We will demonstrate how to calculate neighborhood densities using an exponential decay function. In principle, it’s possible to use any number of different decay functions and to explore various ranges of values that determine the rate of decay. Here, we present the general framework for how one might explore different scenarios, though the ultimate choice of decay function and parameter values will depend on the study. Note that this method of calculating neighborhood density is only possible when neighbors’ x-y coordinates are known.

Code
exponential_decay <- function(mu, distance){
  return(exp(-(1/mu * distance)))
}

Let’s see what the exponential decay function looks like across a range of mu values (Figure 2.2):

Code
# Set range of mu values and distances
decay_values <- seq(from = 1, to = 25, by = 2)

# Use sprintf to add leading 0s, will help with sorting later on
decay_names = paste("exp", sprintf("%02s", decay_values), sep = "") 

distances <- seq(1, 100, 1)

# Generate a dataframe with each combination of mu and distance
example_decay_values <- expand_grid(decay_values, distances) %>%
                        # Rename columns
                        rename(decay_value = decay_values, 
                               distance = distances)

# Evaluate distance decay function for each combination of 
# mu and distance
example_decay_values$decay <- exponential_decay(
                              mu = example_decay_values$decay_value,
                              distance = example_decay_values$distance)

# Plot results
ggplot(example_decay_values, 
       aes(x = distance, y = decay, 
           color = decay_value, group = decay_value)) + 
  ylab("Density") + 
  xlab("Distance (m)") + 
  scale_color_continuous(name = "Value of \ndecay constant") + 
  geom_line() + 
  theme_bw(12)

Figure 2.2: Plot of decay function

2.4.1 Determine which trees are at edge of plot

Trees near the edge of plot boundaries have incomplete information about their neighborhood, because trees outside of the plot boundaries are not mapped. Typically trees within certain distance threshold from the plot edge are excluded from analysis, but still included in calculations of neighborhood densities.

We are going to add a column to our data set called ‘edge’ that is TRUE if within a set distance to the edge of the plot.

For this example data set, the dimensions of the overall plot are 300 x 300 m, ranging from 0-300 on both the x and y axis, and representing a subset of the overall 50 ha forest dynamics plot at BCI.

Code
# Set threshold for distance to edge
distance_threshold_edge = 30

# Add in min and max values for corners of plot
min_x <- 0
max_x <- 300
min_y <- 0
max_y <- 300

dat$edge = dat$gx < min_x + distance_threshold_edge |
           dat$gx > max_x - distance_threshold_edge |
           dat$gy < min_y + distance_threshold_edge |
           dat$gy > max_y - distance_threshold_edge

# How many trees fall within the edge threshold?
table(dat$edge)

FALSE  TRUE 
 4526  2502 

Below is a plot Figure 2.3 of tree locations colored by whether they fall within the edge threshold or not, separated out for each census.

Code
ggplot(dat, aes(x = gx, y = gy, col = edge)) + 
  geom_point() + 
  facet_wrap(~census) +
  coord_fixed(ratio = 1) + 
  theme_bw(12)

Figure 2.3: Map of tree locations colored by whether they fall within the edge threshold

2.5 Calculate distances among focal individual and neighbors

Next, we will calculate distances among individuals in our plot, using an upper threshold distance of what to consider a neighbor.

We use the ‘spatstat.geom’ package to efficiently calculate distances among individuals.

Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density.

If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each census interval, using only the individuals that were alive at the beginning of that census interval.

Code
# Set distance threshold for considering neighbors
distance_threshold_neighborhood <- 30

# Subset to first census - this will be different for different 
# datasets
dat_first_census <- dat %>%
                    filter(census == 1)

# Format into 'ppp' object
dat_ppp = spatstat.geom::ppp(dat_first_census$gx, dat_first_census$gy, 
                             window = owin(range(dat$gx), 
                                           range(dat$gy)), 
                                           checkdup = F)

# Determine close pairs based on distance threshold
# Returns a list that we convert to tibble later
neighbors = spatstat.geom::closepairs(dat_ppp, 
              rmax = distance_threshold_neighborhood, # Max radius
              what = "ijd", # return indicies i, j, and distance 
              twice = TRUE)

# Convert to dataframe
neighbors <- as_tibble(neighbors)

# Take a peek at the data
# i = index of focal individual
# j = index of neighbor
# d = distance to neighbor
head(neighbors)
# A tibble: 6 × 3
      i     j     d
  <int> <int> <dbl>
1  3227  3252 27.0 
2  3227  3238 17.1 
3  3227  3240  5.75
4  3227  3229 18.6 
5  3227  3253 24.4 
6  3227  3222 20.8 

Next we add in additional columns for neighbor characteristic, (e.g., species, size) and whether they are located near the edge of the plot (blue area in Figure 2.3).

Code
# add additional columns

# Add species for individual i
neighbors$sp_i = dat_first_census$sp[neighbors$i] 

# Add whether individual i is near edge
neighbors$edge_i = dat_first_census$edge[neighbors$i] 

# Add species for individual j
neighbors$sp_j = dat_first_census$sp[neighbors$j] 

# Add basal area of individual j
neighbors$ba_j = dat_first_census$ba[neighbors$j] 

We then want to add a column that indicates whether the comparison between the focal individual and the neighbor is conspecific or heterospecific because we are interested separately estimating the densities of conspecifics and heterospecifics.

Code
neighbors$comparison_type <- ifelse(neighbors$sp_i == neighbors$sp_j,
                                    yes = "con", # conspecific
                                    no = "het") # heterospecific

We then remove focal trees that are too close to the edge of the plot

Code
# remove focal trees i that are at the edge
neighbors = neighbors[!neighbors$edge_i, ]

Next, we add columns to our neighbors data set that indicates the distance decay multiplier and the distance decay multiplier weighted by basal area

Code
# Loop through distance decay values
for(x in 1:length(decay_values)){
  
  #  Add in column for distance decay multiplier for each decay value
  # add _ba suffix to column name - will eventually be summed based on
  # number of individual neighbors
  neighbors[, paste0(decay_names[x], "_N")] <- exponential_decay(mu = decay_values[x], 
                                                   distance = neighbors$d)
  
  # Weight distance decay multiplier by basal area of neighbor
  # add _ba suffix to column name
  neighbors[, paste0(decay_names[x], "_BA")] <- exponential_decay(
                                                mu = decay_values[x], 
                             distance = neighbors$d) * neighbors$ba_j
}

Depending on how many distance decay values are being investigated, there may be many columns in the data frame.

Code
head(neighbors)
# A tibble: 6 × 34
      i     j     d sp_i  edge_i sp_j     ba_j comparison_type  exp01_N exp01_BA
  <int> <int> <dbl> <chr> <lgl>  <chr>   <dbl> <chr>              <dbl>    <dbl>
1  2973  2993 14.3  cord… FALSE  capp… 1.57e-4 het             6.36e- 7 9.98e-11
2  2973  3122 20.4  cord… FALSE  des2… 4.91e-4 het             1.40e- 9 6.86e-13
3  2973  2974  5.86 cord… FALSE  des2… 7.85e-5 het             2.85e- 3 2.24e- 7
4  2973  3123 27.2  cord… FALSE  des2… 7.85e-5 het             1.47e-12 1.16e-16
5  2973  3121 14.5  cord… FALSE  mico… 1.77e-4 het             5.27e- 7 9.32e-11
6  2973  2936 13.4  cord… FALSE  des2… 7.85e-5 het             1.45e- 6 1.14e-10
# ℹ 24 more variables: exp03_N <dbl>, exp03_BA <dbl>, exp05_N <dbl>,
#   exp05_BA <dbl>, exp07_N <dbl>, exp07_BA <dbl>, exp09_N <dbl>,
#   exp09_BA <dbl>, exp11_N <dbl>, exp11_BA <dbl>, exp13_N <dbl>,
#   exp13_BA <dbl>, exp15_N <dbl>, exp15_BA <dbl>, exp17_N <dbl>,
#   exp17_BA <dbl>, exp19_N <dbl>, exp19_BA <dbl>, exp21_N <dbl>,
#   exp21_BA <dbl>, exp23_N <dbl>, exp23_BA <dbl>, exp25_N <dbl>,
#   exp25_BA <dbl>

2.6 Calculate neighborhood density

Then we summarize neighborhood density for each focal tree separately for conspecifics and heterospecifics.

Code
# Simple calculations of number of neighbors and total basal area, 
# ignoring distance decay
neighbors_summary <- neighbors %>%
                     group_by(i, comparison_type) %>%
                     summarise(nodecay_N = n(), # count of neighbors
                          nodecay_BA = sum(ba_j)) # sum of basal area)

# Add in decay columns
neighbors_summary_decay <- neighbors %>%
                            group_by(i, comparison_type) %>%
                      # Select only columns related to distance decay
                            select(starts_with("exp")) %>% 
                      # Summarize them all by summing columns
                            summarise_all(sum) 

# Join both together
neighbors_summary <- left_join(neighbors_summary, 
                               neighbors_summary_decay,
                               by = c("i", "comparison_type"))

# Add treeID column
neighbors_summary$treeID<-dat_first_census$treeID[neighbors_summary$i]


   
# If there are any focal individuals with no neighbors, add values 
# of 0 for neighborhood densities 
      noNeighbors = dat_first_census$treeID[!dat_first_census$treeID 
                                      %in% neighbors_summary$treeID &
                                          !dat_first_census$edge]
      
      # If there are individuals with no neighbors
      if (length(noNeighbors) > 0) {
        neighbors_summary = bind_rows(neighbors_summary, 
                                      expand_grid(i = NA, 
                                             treeID = noNeighbors, 
                             comparison_type = c("het", "cons"))) %>%
                            # Add 0s where NA
                            mutate_all(replace_na, replace = 0) 
        }

# Take a peak at the data
head(neighbors_summary)
# A tibble: 6 × 31
# Groups:   i [6]
      i comparison_type nodecay_N nodecay_BA exp01_N   exp01_BA exp03_N exp03_BA
  <int> <chr>               <int>      <dbl>   <dbl>      <dbl>   <dbl>    <dbl>
1     5 het                   115     0.487  0.0991     4.05e-5   2.03  0.00172 
2     8 het                    94     0.0539 0.181      3.63e-5   1.21  0.000703
3     9 het                   151     2.17   0.0464     1.67e-4   1.65  0.00287 
4    10 het                    76     0.0509 0.00131    4.72e-7   0.464 0.000175
5    11 het                    59     0.0631 0.00859    1.99e-6   0.493 0.000277
6    12 het                    93     0.0506 0.0441     9.51e-6   1.57  0.000745
# ℹ 23 more variables: exp05_N <dbl>, exp05_BA <dbl>, exp07_N <dbl>,
#   exp07_BA <dbl>, exp09_N <dbl>, exp09_BA <dbl>, exp11_N <dbl>,
#   exp11_BA <dbl>, exp13_N <dbl>, exp13_BA <dbl>, exp15_N <dbl>,
#   exp15_BA <dbl>, exp17_N <dbl>, exp17_BA <dbl>, exp19_N <dbl>,
#   exp19_BA <dbl>, exp21_N <dbl>, exp21_BA <dbl>, exp23_N <dbl>,
#   exp23_BA <dbl>, exp25_N <dbl>, exp25_BA <dbl>, treeID <chr>

As described in the main text, it can be advantageous to use total density that includes both conspecific and heterospecific density as a covariate, rather than only heterospecific density.

Here, we calculate overall density by summing heterospecific and conspecific densities.

Code
# First convert to long format which will make it easy to sum across 
# heterospecific and conspecific values
neighbors_summary_long_format <-  neighbors_summary %>%
                                  pivot_longer(cols = -c("i", "comparison_type", "treeID"))

# Sum across heterospecific and conspecific values and rename to 'all'
neighbors_total_long_format <- neighbors_summary_long_format %>%
                                group_by(i, treeID, name) %>%
                                summarize(value = sum(value)) %>%
                                mutate(comparison_type = "all")

# Bind together conspecific and 'all' densities
# remove heterospecific columns
# fill in 0s where there are no neighbors
neighbors_summary_total = bind_rows(neighbors_summary_long_format,
                                    neighbors_total_long_format) %>%
                        # Can filter out heterospecific neighborhood 
                        # values by uncommenting this line of the
                        # objects become too large
                        #  filter(comparison_type != "het") %>% 
                mutate(name = paste0(comparison_type, "_", name)) %>%
                select(-comparison_type) %>%
                pivot_wider(names_from = name, values_from = value, 
                            values_fill = 0)

2.7 Model mortality as a function of neighborhood density

To determine the ‘best’ decay parameter to use, we fit species-specific mortality models using Generalized Additive Models GAMs and compare models based on log likelihoods. GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.

We first create our data set that we will use in the GAMs, subsetting down to just one census interval (because our example dataset only has 2 censuses) and removing trees close to the edge. In other datasets, you may have multiple census intervals, where it would be common practice to include ‘year’ or ‘census’ as a random effect in the model, but otherwise the overall approach is similar.

Code
# Join census data with neighborhood data
dat_gam <- left_join(dat_first_census,
                               neighbors_summary_total,
                               by = "treeID")

# Remove edge trees
dat_gam <- dat_gam %>%
                    filter(edge == FALSE)

For each species, we summarize data availability to help determine parameters for the GAM smooth terms.

Code
# Summarize data availability at species level to set the degree of 
# smoothness for GAM smooth terms
sp_summary <- dat_gam %>% 
  group_by(sp) %>% 
  summarise(ndead = sum(mort_next),
            nsurv = sum(surv_next),
            range_con_BA = max(con_nodecay_BA) - min(con_nodecay_BA),
            max_con_BA = max(con_nodecay_BA),
            unique_con_BA = length(unique(con_nodecay_BA)),
            unique_all_BA = length(unique(all_nodecay_BA)),
            range_con_N = max(con_nodecay_N) - min(con_nodecay_N),
            max_con_N = max(con_nodecay_N),
            unique_con_N = length(unique(con_nodecay_N)),
            unique_all_N = length(unique(all_nodecay_N)),
            unique_dbh = length(unique(dbh))
  )

# Filter out species that have 0 or 100% mortality
sp_summary <- sp_summary %>%
  filter(ndead > 0) %>%
  filter(nsurv > 0)

In this long block of code, we loop over all possible combinations of decay values for neighborhood densities weighted by abundance (N) and size (BA) for each species and fit a separate GAM for each model. For each GAM, we assess whether the model was able to be fit, and if it was able to be fit, whether it converged and for potential warnings. We save the results of successful model fits into a list that we will process later.

For large datasets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.

Code
# Initialize list that will save model outputs
res_mod <- list()


# Model run settings
run_settings <- expand_grid(species = unique(sp_summary$sp),
                            decay_con = c("nodecay", decay_names),
                            decay_total = c("nodecay", decay_names),
                            nhood_data_type = c("N", "BA"))


# Loop through model run settings
for(run_settings_row in 1:nrow(run_settings)){
  
  # Extract values from run settings dataframe
  species <- run_settings$species[run_settings_row]
  decay_con <- run_settings$decay_con[run_settings_row]
  decay_total <- run_settings$decay_total[run_settings_row]
  nhood_data_type <- run_settings$nhood_data_type[run_settings_row]

  # Subset down to just focal species
  dat_subset <- dat_gam %>%
    filter(sp == species)

  # Set run name
  run_name <- paste0(species, "_total", decay_total,"_con", 
                     decay_con, "_", nhood_data_type)
  
  # Print status if desired
  # cat("Working on run: ", run_name, " ...\n")

  # Create model formula
  # Initial DBH included as predictor variable
  form =  paste0("mort_next ~ s(dbh, k = k1) + s(all_", decay_total, 
                              "_", nhood_data_type, 
                              ", k = k2)  + s(con_", decay_con, 
                              "_", nhood_data_type, 
                              ", k = k3)")
    
  # Convert into formula
  form <- as.formula(form)
    
    # Choose penalties for model fitting
    # set to default 10 (the same as -1)
    # The higher the value of k, the more flexible the smooth term becomes, allowing for more intricate and wiggly patterns. Conversely, lower values of k result in smoother and simpler representations.
    k1 = k2 = k3 = 10
    if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) { 
      k1 = sp_summary$unique_dbh[sp_summary$sp == species] - 2
    }
    if (k2 > sp_summary$unique_all_N[sp_summary$sp == species]) {
      k2 = sp_summary$unique_all_N [sp_summary$sp == species]- 2
    }
    if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) { 
      k3 = sp_summary$unique_con_N[sp_summary$sp == species] - 2 
    }
    

   # Fit model
   # wrap in a try function to catch any errors
   mod = try(gam(form,
            family = binomial(link=cloglog),
            offset = log(interval),
            data = dat_subset,
            method = "REML"), 
          silent = T)

   
    # Check if model was able to fit
    if (!any(class(mod) == "gam")) {
      # print(paste("gam failed for:", run_name))
    } else {

    # Check if gam converged
    if (!mod$converged) {
      # print(paste("no convergence for:", run_name))
    } else {
  
      # check for complete separation
      # https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r
      # Explore warning "glm.fit: fitted probabilities numerically 0 
      # or 1 occurred"
      eps <- 10 * .Machine$double.eps
      glm0.resids <- augment(x = mod) %>%
        mutate(p = 1 / (1 + exp(-.fitted)),
               warning = p > 1-eps,
               influence = order(.hat, decreasing = T))
      infl_limit = round(nrow(glm0.resids)/10, 0)
      # check if none of the warnings is among the 10% most 
      # influential observations, than it is okay..
      num = any(glm0.resids$warning & glm0.resids$influence < infl_limit)
      
      # complete separation
      if (num) {
       # print(paste("complete separation is likely for:", run_name))
      } else {
        
        # missing Vc
        if (is.null(mod$Vc)) {
         # print(paste("Vc not available for:", run_name))
        } else {
        
          # Add resulting model to list if it passes all checks
          res_mod[[run_name]] <- mod
          
        } # Vc ifelse
      } # complete separation ifelse
    } # convergence ifelse
  } # model available ifelse
} # end run settings loop

2.8 Summarize model fits

Next, we will extract regression coefficients into a dataframe using broom::tidy()

Code
# Extract coefficients for each model into a list
coefs = lapply(res_mod, broom::tidy) 

# Add a column for model run to each object in the list
coefs = Map(cbind, coefs, run_name = names(coefs)) 
coefs = do.call(rbind, coefs) # Bind elements of list together by rows
rownames(coefs) <- NULL # Remove row names
coefs <- coefs %>%
          select(run_name, everything()) # Rearrange columns

# Take a look at the data
knitr::kable(head(coefs), digits = 2, booktabs = T) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
run_name term edf ref.df statistic p.value
cappfr_totalnodecay_connodecay_N s(dbh) 1.00 1.00 2.46 0.12
cappfr_totalnodecay_connodecay_N s(all_nodecay_N) 1.23 1.43 1.25 0.31
cappfr_totalnodecay_connodecay_N s(con_nodecay_N) 1.00 1.00 1.84 0.17
cappfr_totalnodecay_connodecay_BA s(dbh) 4.36 5.31 9.83 0.09
cappfr_totalnodecay_connodecay_BA s(all_nodecay_BA) 1.00 1.00 0.36 0.55
cappfr_totalnodecay_connodecay_BA s(con_nodecay_BA) 1.00 1.00 3.04 0.08

Next we will extract model summaries for each model with broom::glance() that provides key information like degrees of freedom, log likelihood, AIC, etc.

Code
# Extract summaries for each model into a list
sums = lapply(res_mod, broom::glance) 

# Add a column for model run to each object in the list
sums = Map(cbind, sums, run_name = names(sums)) 
sums = do.call(rbind, sums) # Bind elements of list together by rows
rownames(sums) <- NULL # Remove row names

# Separate run name into columns for species, decay, and density type
sums <- sums %>%
        separate(run_name, into = c("sp", "decay_total", 
                                    "decay_con", "density_type"), 
                 remove = FALSE)

# Remove 'total' and 'con' from decay columns
sums$decay_total <- gsub("total", "", sums$decay_total)
sums$decay_con <- gsub("con", "", sums$decay_con)

# Rearrange columns  
sums <- sums %>%
          select(run_name, sp, decay_total, decay_con, density_type, 
                 everything()) 

# Take a look at the model summaries
knitr::kable(head(sums), digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
run_name sp decay_total decay_con density_type df logLik AIC BIC deviance df.residual nobs
cappfr_totalnodecay_connodecay_N cappfr nodecay nodecay N 4.23 -22.62 54.10 70.37 45.23 285.77 290
cappfr_totalnodecay_connodecay_BA cappfr nodecay nodecay BA 7.36 -17.21 51.03 81.52 34.41 282.64 290
cappfr_totalexp01_connodecay_N cappfr exp01 nodecay N 5.06 -22.64 56.52 77.14 45.29 284.94 290
cappfr_totalexp01_connodecay_BA cappfr exp01 nodecay BA 8.08 -17.54 53.90 88.44 35.07 281.92 290
cappfr_totalexp03_connodecay_N cappfr exp03 nodecay N 8.07 -17.87 54.98 90.30 35.73 281.93 290
cappfr_totalexp03_connodecay_BA cappfr exp03 nodecay BA 7.93 -17.62 53.85 87.98 35.24 282.07 290

Due to limited sample sizes, it is likely that GAMs will not fit for each species. For example, in the table below of summary data of species where models did not successfully fit/converge, there are several instances of where all individuals of the species survived during the census, leading to no mortality events to use in the model.

We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.

Code
# Tally up number of model runs by species and total decay values
table(sums$sp, sums$decay_total)
        
         exp01 exp03 exp05 exp07 exp09 exp11 exp13 exp15 exp17 exp19 exp21
  cappfr    28    28    28    28    28    28    28    28    28    28    28
  cordbi    27    28    28    28    28    28    28    28    28    27    27
  des2pa    28    28    28    28    28    28    28    28    28    28    28
  micone    28    28    28    28    28    28    28    28    28    28    28
  pentma    28    28    28    28    28    23    27    28    28    28    28
  stylst    28    28    28    28    28    28    28    28    28    28    28
        
         exp23 exp25 nodecay
  cappfr    28    28      28
  cordbi    28    28      28
  des2pa    28    28      28
  micone    28    28      28
  pentma    28    28      28
  stylst    28    28      28
Code
# get incomplete run-site-species combinations
run_counts_by_sp <- sums %>% 
                    group_by(sp) %>%
                    tally() %>%
                    # Join with overall species list
                    left_join(sp_summary %>% select(sp), ., by = "sp") 

# Get expected number of runs if all models work
expected_total_runs <- run_settings %>%
                       group_by(species) %>%
                       tally() %>%
                       pull(n) %>%
                       max()

# Save species names where they didn't have all expected combinations 
# of model runs
incomplete = run_counts_by_sp$sp[run_counts_by_sp$n < 
                                   expected_total_runs | 
                                   is.na(run_counts_by_sp$n)]

# Species with successful runs
knitr::kable(sp_summary[!sp_summary$sp %in% incomplete, ], 
             digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
sp ndead nsurv range_con_BA max_con_BA unique_con_BA unique_all_BA range_con_N max_con_N unique_con_N unique_all_N unique_dbh
cappfr 5 285 0.02 0.02 273 290 30 32 31 111 31
des2pa 174 1172 0.14 0.15 1335 1343 123 140 123 157 77
micone 38 28 0.00 0.00 48 66 12 12 13 55 11
stylst 3 101 0.02 0.02 91 104 13 13 14 68 30
Code
# Species without successful runs
knitr::kable(sp_summary[sp_summary$sp %in% incomplete, ], 
             digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
sp ndead nsurv range_con_BA max_con_BA unique_con_BA unique_all_BA range_con_N max_con_N unique_con_N unique_all_N unique_dbh
acalma 2 5 0.00 0.00 4 7 2 2 2 7 6
cordbi 7 36 0.25 0.25 36 43 10 10 11 38 28
pentma 12 21 0.00 0.00 20 33 6 6 6 31 11
sponra 3 15 0.08 0.08 12 18 5 5 4 17 16

2.9 Selecting optimum decay parameter values across all species

We then summarize different model criteria across all species runs. To look for the optimal value for decay parameters, we sum log likelihoods across all species for a given decay parameter combination and choose the resulting parameter combination with the highest summed log likelihood.

Code
sums_total <- sums %>% 
              filter(!sp %in% incomplete) %>%
              group_by(decay_total, decay_con, density_type) %>%
              summarise(nvalues = n(),
                        sumlogLik = sum(logLik),
                        meanlogLik = mean(logLik)) %>%
              arrange(decay_total, decay_con, density_type)

sums_total
# A tibble: 392 × 6
# Groups:   decay_total, decay_con [196]
   decay_total decay_con density_type nvalues sumlogLik meanlogLik
   <chr>       <chr>     <chr>          <int>     <dbl>      <dbl>
 1 exp01       exp01     BA                 4     -556.      -139.
 2 exp01       exp01     N                  4     -554.      -138.
 3 exp01       exp03     BA                 4     -553.      -138.
 4 exp01       exp03     N                  4     -558.      -140.
 5 exp01       exp05     BA                 4     -554.      -138.
 6 exp01       exp05     N                  4     -561.      -140.
 7 exp01       exp07     BA                 4     -557.      -139.
 8 exp01       exp07     N                  4     -563.      -141.
 9 exp01       exp09     BA                 4     -558.      -140.
10 exp01       exp09     N                  4     -564.      -141.
# ℹ 382 more rows

We create a heatmap plot Figure 2.4 of summed log likelihoods for all parameter combinations across all species, with the optimal parameter combination marked with an X

Code
# Find optimum value separately for N and BA
  optimum <- sums_total %>%
               group_by(density_type) %>%
               slice_max(sumlogLik)
  
# Plot heatmap of log likelihood values
  ggplot(sums_total, aes(x = decay_total, y = decay_con, 
                         fill = sumlogLik)) +
    geom_tile(width = 0.9, height = 0.9, col = "black") + 
    scale_fill_viridis_c() + # viridis color palette
    geom_label(data = optimum, label = "X") + 
    labs(x = "Decay total density", y = "Decay conspecific density", 
         fill = "sumlogLik") + 
    facet_wrap(~density_type, ncol = 1) + 
    theme_bw(12) + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
                                     hjust=1))

Figure 2.4: Heatmap of optimal values for decay constants

For this data set, the following are the optimal decay parameter values across all species separately for neighborhood density calculated by abundance (N) and by basal area (BA)

Code
knitr::kable(optimum, digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
decay_total decay_con density_type nvalues sumlogLik meanlogLik
exp19 exp25 BA 4 -541.69 -135.42
exp03 exp01 N 4 -541.59 -135.40

2.10 Selecting optimum decay parameter values separately for each species

Alternatively, it may be useful to determine optimum decay parameters on a species by species basis. To do this, we find the maximum log liklihood for each species separately across decay parameter combination and choose the resulting parameter combination with the highest log likelihood.

Code
sums_total_by_spp <- sums %>% 
              filter(!sp %in% incomplete) %>%
              group_by(decay_total, decay_con, density_type, sp) %>%
              summarise(nvalues = n(),
                        sumlogLik = sum(logLik),
                        meanlogLik = mean(logLik)) %>%
              arrange(decay_total, decay_con, density_type)

sums_total_by_spp
# A tibble: 1,568 × 7
# Groups:   decay_total, decay_con, density_type [392]
   decay_total decay_con density_type sp     nvalues sumlogLik meanlogLik
   <chr>       <chr>     <chr>        <chr>    <int>     <dbl>      <dbl>
 1 exp01       exp01     BA           cappfr       1    -16.6      -16.6 
 2 exp01       exp01     BA           des2pa       1   -489.      -489.  
 3 exp01       exp01     BA           micone       1    -43.7      -43.7 
 4 exp01       exp01     BA           stylst       1     -6.84      -6.84
 5 exp01       exp01     N            cappfr       1    -14.3      -14.3 
 6 exp01       exp01     N            des2pa       1   -489.      -489.  
 7 exp01       exp01     N            micone       1    -41.3      -41.3 
 8 exp01       exp01     N            stylst       1     -8.64      -8.64
 9 exp01       exp03     BA           cappfr       1    -20.7      -20.7 
10 exp01       exp03     BA           des2pa       1   -485.      -485.  
# ℹ 1,558 more rows

We create a heatmap plot Figure 2.5 of log likelihoods for all parameter combinations for each species separately, with the optimal parameter combination marked with an X. We only display the first three species here, because with data sets containing many species, it will be hard to visualize all the species on one graph and you may need to subdivide the plot further for visualization.

Code
  sums <- sums %>%
    filter(sp %in% c("cappfr", "cordbi", "des2pa")) %>%
    group_by(sp) %>%
    # Scale loglikihood by species to help with visualization
    mutate(logLik_scaled = scale(logLik)) %>% 
    ungroup()
 
# Find optimum value separately for N and BA
  optimum_by_sp <- sums %>%
               group_by(sp, density_type) %>%
               slice_max(logLik_scaled, with_ties = FALSE)

# Plot heatmap of log likelihood values
  ggplot(sums, aes(x = decay_total, y = decay_con,
                   fill = logLik_scaled)) +
    geom_tile(width = 0.9, height = 0.9, col = "black") +
    geom_label(data = optimum_by_sp, label = "X") +
    labs(x = "Decay total density", y = "Decay conspecific density", 
         fill = "logLik") +
    facet_wrap(~sp + density_type, ncol = 2, scales = "free") +
    scale_fill_viridis_c() + # viridis color palette
    theme_bw(12) + 
    theme(legend.position = "right") + 
    labs(fill = "Log likelihood\n(scaled)") + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
                                     hjust=1))

Figure 2.5: Heatmap of optimal values for decay constants separately for each species

For this data set, the following are the optimal decay parameter values for each species separately for neighborhood density calculated by abundance (N) and by basal area (BA):

Code
knitr::kable(optimum_by_sp, digits = 2) %>%
  kable_styling(latex_options = c("striped", "scale_down"))
run_name sp decay_total decay_con density_type df logLik AIC BIC deviance df.residual nobs logLik_scaled
cappfr_totalexp19_conexp25_BA cappfr exp19 exp25 BA 11.09 -8.74 43.29 90.63 17.48 278.91 290 2.9564926
cappfr_totalexp03_conexp01_N cappfr exp03 exp01 N 11.03 -11.80 49.64 97.43 23.59 278.97 290 2.1291275
cordbi_totalexp05_conexp03_BA cordbi exp05 exp03 BA 7.61 -12.46 43.10 58.90 24.93 34.39 42 0.2471958
cordbi_totalexp01_conexp07_N cordbi exp01 exp07 N 14.84 0.00 31.83 59.48 0.00 27.16 42 4.5489731
des2pa_totalexp25_conexp09_BA des2pa exp25 exp09 BA 8.36 -481.80 984.07 1037.31 963.61 1337.64 1346 1.7645474
des2pa_totalexp03_connodecay_N des2pa exp03 nodecay N 7.33 -481.92 981.23 1026.49 963.84 1338.67 1346 1.6992914